3-D RECURSIVE VECTOR ESTIMATION FOR VIDEO ENHANCEMENT



TECHNICAL FIELD OF THE INVENTION

The present invention is directed, in general, to
video enhancement systems and, more specifically, to
maintaining spatio-temporal consistency during video
enhancement.

BACKGROUND OF THE INVENTION

Many contemporary high performance televisions,
particularly large screen and wide screen versions, utilize
a spatial-temporal resolution which is higher than the
normal resolution and refresh rate. For example, a 100
Hertz (Hz) screen refresh rate may be employed for the
television display rather than the standard 50 or 60 Hertz.
However, because the field rate--the number of interlaced
screen images or "fields" for the television--within the
program signal received will typically be only 50 fields
per second, the number of fields for display must be
doubled.

For digital televisions employing a field memory (a
memory with the capacity to store a digitized version of a
complete television field), one technique for doubling the
field rate involves simply writing to the field memory at a 
first rate and reading from the field memory at a second 
rate which is double the first rate. However, such field 
rate up-conversion by simple field repetition results in 
each movement phase (i.e., frame) being displayed multiple 
times, with moving objects appearing slightly displaced 
from their expected spatio-temporal (space-time) position 
in the repeated movement phases as illustrated in FIGURE 5. 

The space-time positioning 501a, 501b and 501c of an 
object moving linearly across the screen within a sequence 
of three fields n-2, n-1 and n is shown in FIGURE 5. Field 
rate up-conversion by field repetition produces intermediate
fields (not labeled) in which the space-time
positioning of the object is 503a, 503b and 503c rather 
than the expected space-time positioning of 502a, 502b, and 
502c. 
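
As a rough numerical illustration of the displacement described above, the following sketch compares the position shown by simple field repetition with the position expected for linear motion. The function name, the constant velocity in pixels per input field, and the sample values are assumptions made for illustration and are not taken from FIGURE 5.

```python
def upconverted_positions(start, velocity, num_fields):
    """Return (repeated, expected) object positions for 2x field-rate up-conversion.

    Assumes an object moving linearly at `velocity` pixels per input field.
    Output field k is displayed at time k/2, measured in input-field periods.
    """
    repeated = []   # position shown when each input field is simply repeated
    expected = []   # position expected for true linear motion
    for k in range(2 * num_fields):
        repeated.append(start + velocity * (k // 2))    # holds the last input field
        expected.append(start + velocity * (k / 2.0))   # position on the motion path
    return repeated, expected


repeated, expected = upconverted_positions(start=0, velocity=8, num_fields=3)
errors = [e - r for r, e in zip(repeated, expected)]
```

Under these assumptions every second output field lags the expected position by half of the per-field displacement, which is the kind of offset that motion-compensated interpolation is meant to remove.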

While the displacement is almost unnoticeable to the
human eye for video information captured at the normal field
rates (50-60 Hz) employed by video cameras and the like,
motion picture cameras have, for historical electro-mechanical
reasons, operated at a capture rate of 24 frames
per second. While modern motion picture cameras have been 
improved, much film exists which was recorded at that 
previously-standard capture rate. Such film is normally 
converted for television display by running the film at 
approximately 25 frames per second and then scanning each 
frame twice such that adjacent pairs of identical fields 
are created within the video information. 

When up-converting a television formatted motion 
picture to a higher field rate utilizing simple field 
repetition, the already duplicated fields are again 
duplicated, creating sequences of four identical fields 
within the video information and resulting in a significant 
amount of motion jitter and picture blurring. To address 
these problems, motion compensation techniques such as 
three dimensional (3-D) recursive search block matching 
have been developed to provide motion-compensated
interpolation. See, for example, G. de Haan, Motion Estimation
and Compensation - An Integrated Approach to Consumer
Display Field Rate Conversion (ISBN 90-74445-01-2) and G.
de Haan et al., "True-Motion Estimation with 3-D Recursive
Search Block Matching," IEEE Trans. on Circuits and Systems
for Video Technology, 3(5):368-379 (October 1993).
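
The build-up of identical fields described above can be reproduced with a few lines of code. This is only a sketch: the function names are hypothetical, and each field is represented simply by the index of the film frame it was scanned from.

```python
from itertools import groupby


def scan_film_to_fields(num_frames, scans_per_frame=2):
    """Label each field with the film frame it was scanned from."""
    return [frame for frame in range(num_frames) for _ in range(scans_per_frame)]


def repeat_fields(fields, factor=2):
    """Field-rate up-conversion by simple field repetition."""
    return [f for f in fields for _ in range(factor)]


fields_in = scan_film_to_fields(3)        # pairs of identical fields
fields_out = repeat_fields(fields_in)     # each pair is duplicated again
run_lengths = [len(list(group)) for _, group in groupby(fields_out)]
```

Each film frame ends up occupying a run of four consecutive output fields, so motion advances only once every four fields.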

High definition television (HDTV) often imposes a
requirement differing from, and either in addition to or in
lieu of, field rate up-conversion: image resolution
enhancement. As illustrated in FIGURES 6A and 6B, image 
resolution enhancement requires up-conversion from one 
resolution and the corresponding pixel size 601a and/or 
pixel density 602a to a higher resolution having a smaller 
pixel size 601b and/or greater pixel density 602b. Known 
interpolation techniques are employed to generate the 
additional pixels required from the original video 
information.

As known in the art, the shape or magnitude of edges
within an image contributes significantly to the overall
impression of "sharpness" for the image. Accordingly,
various edge enhancement techniques such as frequency
peaking and luminance transient improvement (LTI) have been
developed for use during image resolution enhancement.
Frequency peaking involves linear boosting or "peaking" of
selected spatial frequencies within the image, often with a
bandpass or highpass filter to enhance the associated
spatial frequencies and with adaptive control to avoid
"unnaturalness" relating to, for example, peaking large and
steep edges. Unlike frequency peaking, luminance transient
improvement preserves the magnitude of the edge but
increases the steepness of the edge, "pulling" samples near
the edge on both sides towards the edge.
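
The difference between the two techniques can be seen on a one-dimensional edge. The sketch below is illustrative only: the second-difference kernel, the gain, and the clip-to-local-extrema realization of LTI are assumptions chosen for brevity, not the filters of any particular product or of the cited references.

```python
def highpass(signal):
    """Second-difference high-pass response (kernel -1, 2, -1), edges replicated."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append(2 * signal[i] - left - right)
    return out


def frequency_peaking(signal, gain=0.5):
    """Linear boost: add a scaled high-pass component (may overshoot the edge)."""
    return [s + gain * h for s, h in zip(signal, highpass(signal))]


def luminance_transient_improvement(signal, gain=0.5, radius=2):
    """Steepen the edge, then clip to the local min/max so its magnitude is kept."""
    n = len(signal)
    peaked = frequency_peaking(signal, gain)
    out = []
    for i, p in enumerate(peaked):
        window = signal[max(i - radius, 0):min(i + radius + 1, n)]
        out.append(min(max(p, min(window)), max(window)))
    return out


edge = [0, 0, 0, 2, 5, 8, 10, 10, 10]
peaked = frequency_peaking(edge)
lti = luminance_transient_improvement(edge)
```

With these assumed settings the peaked signal overshoots and undershoots the original edge levels, while the LTI result stays within them and still has a steeper transition.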

Existing edge enhancement algorithms enhance the
sharpness of an image based on the spatial information of
the original image, often utilizing control parameters
determined by a small spatial neighborhood of a given pixel
position. While these techniques are generally sufficient
for still images, time varying conditions within video
information such as (but not limited to) noise, motion, or
lighting conditions, or even spatio-temporally varying
conditions, may cause annoying artifacts in the processed
video information. Conservative tuning of the parameters
may prevent such artifacts, but also constrains the
enhancement.

There is, therefore, a need in the art for enhancement 
of video information with spatio-temporal consistency, or 
consistency of enhanced image data both with spatially 
surrounding (enhanced) image data in the field containing 
the enhanced image data and with counterpart or 
corresponding image data within subsequent fields. 

SUMMARY OF THE INVENTION



To address the above-discussed deficiencies of the 
prior art, it is a primary object of the present invention 
to provide, for use in a video signal processor, a 
technique for enhancing video information which evaluates 
candidate vectors of enhancement algorithms utilizing an 
error function biased towards spatio-temporal consistency 
with a penalty function. The penalty function increases 
with the distance--both spatial and temporal--of the 
subject block from the block for which the candidate vector 
was optimal. Enhancements are therefore gradual across 
both space and time and the enhanced video information is 
intrinsically free of perceptible artifacts. 
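
A penalty of the kind summarized above might be sketched as follows. The linear form, the weights, and the representation of a block as a (block_x, block_y, field) triple are all assumptions made for illustration; the text only requires that the penalty grow with spatial and temporal distance.

```python
def consistency_penalty(subject, source, spatial_weight=1.0, temporal_weight=2.0):
    """Penalty growing with block distance in space and in time (assumed linear form).

    `subject` is the block being enhanced and `source` is the block for which
    the candidate vector was optimal; both are (block_x, block_y, field) triples.
    """
    dx = abs(subject[0] - source[0])
    dy = abs(subject[1] - source[1])
    dt = abs(subject[2] - source[2])
    return spatial_weight * (dx + dy) + temporal_weight * dt


same_block = consistency_penalty((4, 4, 10), (4, 4, 10))
spatial_neighbor = consistency_penalty((4, 4, 10), (5, 5, 10))
temporal_neighbor = consistency_penalty((4, 4, 10), (4, 2, 9))
```

A candidate taken from the block itself incurs no penalty, while candidates from more distant blocks or earlier fields must reduce the match error by more than their penalty in order to be selected.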

The foregoing has outlined rather broadly the features
and technical advantages of the present invention so that
those skilled in the art may better understand the detailed
description of the invention that follows. Additional
features and advantages of the invention will be described
hereinafter that form the subject of the claims of the
invention. Those skilled in the art will appreciate that
they may readily use the conception and the specific
embodiment disclosed as a basis for modifying or designing
other structures for carrying out the same purposes of the
present invention. Those skilled in the art will also
realize that such equivalent constructions do not depart
from the spirit and scope of the invention in its broadest
form.

Before undertaking the DETAILED DESCRIPTION OF THE
INVENTION below, it may be advantageous to set forth
definitions of certain words or phrases used throughout
this patent document: the terms "include" and "comprise,"
as well as derivatives thereof, mean inclusion without
limitation; the term "or" is inclusive, meaning and/or; the
phrases "associated with" and "associated therewith," as
well as derivatives thereof, may mean to include, be
included within, interconnect with, contain, be contained
within, connect to or with, couple to or with, be
communicable with, cooperate with, interleave, juxtapose,
be proximate to, be bound to or with, have, have a property
of, or the like; and the term "controller" means any
device, system or part thereof that controls at least one
operation, whether such a device is implemented in
hardware, firmware, software or some combination of at
least two of the same. It should be noted that the
functionality associated with any particular controller may
be centralized or distributed, whether locally or remotely.
Definitions for certain words and phrases are provided 
throughout this patent document, and those of ordinary 
skill in the art will understand that such definitions 
apply in many, if not most, instances to prior as well as 
future uses of such defined words and phrases. 

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present 
invention, and the advantages thereof, reference is now 
made to the following descriptions taken in conjunction 
with the accompanying drawings, wherein like numbers 
designate like objects, and in which: 

FIGURE 1 depicts a system in which video enhancement 
with spatio-temporal consistency is implemented according 
to one embodiment of the present invention; 

FIGURE 2 illustrates in greater detail a system for 
video enhancement with spatio-temporal consistency 
according to one embodiment of the present invention; 

FIGURE 3 illustrates a logical organization of video 
information for video enhancement with spatio-temporal 
consistency according to one embodiment of the present 
invention; 

FIGURE 4 is a high level flow chart for a process of 
video enhancement with spatio-temporal consistency 
according to one embodiment of the present invention; 

FIGURE 5 is an illustration of displacement of a 
moving object from an expected position as a result of 
field rate conversion through field repetition; and 

FIGURES 6A and 6B are comparative illustrations for
spatial resolution enhancement.

DETAILED DESCRIPTION OF THE INVENTION

FIGURES 1 through 4, discussed below, and the various 
embodiments used to describe the principles of the present 
invention in this patent document are by way of 
illustration only and should not be construed in any way to 
limit the scope of the invention. Those skilled in the art 
will understand that the principles of the present 
invention may be implemented in any suitably arranged 
device. 

FIGURE 1 depicts a system in which video enhancement 
with spatio-temporal consistency is implemented according 
to one embodiment of the present invention. System 100 
includes a receiver 101, which in the exemplary embodiment 
is a high definition digital television (HDTV) large-screen 
or wide-screen television receiver. Alternatively, however,
receiver 101 may be an intermediate transceiver or
any other device employed to receive or transceive video
signals, as for example a transceiver retransmitting video
information for reception by a high definition television.
In any embodiment, receiver 101 includes a video enhancement
mechanism as described in further detail below.

Receiver 101 includes an input 102 for receiving video
signals and may optionally include an output 103 for 
transmitting enhanced video signals to another device. In 
the exemplary embodiment, receiver 101 includes a high 
definition television display 104 upon which images 
rendered or otherwise generated according to the enhanced
video information are displayed. 

Those skilled in the art will perceive that FIGURE 1 
does not explicitly depict all components within the high 
definition television receiver of the exemplary embodiment. 
Only so much of the commonly known construction and 
operation of a high definition television receiver and the 
components therein as are unique to the present invention 
and/or required for an understanding of the present 
invention are shown and described herein. 

FIGURE 2 illustrates in greater detail a system for 
video enhancement with spatio-temporal consistency 
according to one embodiment of the present invention. 
Receiver 101 includes a video signal processor 201, which 
may be implemented by a single integrated circuit device or 
a combination of integrated circuit devices. Video signal 
processor 201 includes an enhancement vector estimator 202 
and enhancement processor 203 which perform the video 
enhancement processing. Video signal processor 201 in the
exemplary embodiment is the device from which the enhanced
video output is transmitted either to display 104 or to a
storage medium (not shown).

Enhancement processor 203 performs the processing on 
received video signals required to enhance the video for 
display. Image or video enhancement is a broad area which 
may be roughly divided into three categories: restoration 
of "lost" (image/video) information; elimination of 
artifacts; and enhancement of selected image/video
characteristics. Although the present invention is not limited
to any particular category of video enhancement, for
purposes of simplicity, resolution enhancement, which falls
within the third category, will be utilized to describe and
explain the invention. Nonetheless, those skilled in the
art will understand that the invention may be readily
adapted or extended to video enhancements other than
resolution enhancement and falling within any of the three
categories listed.

Enhancement processor 203, together with enhancement 
vector estimator 202 in the exemplary embodiment, performs 
spatial resolution enhancement on the video information 
received. The technique for estimation of enhancement 
vectors according to the present invention is similar to
the recursive search block matching motion estimation 
process described in the references identified above. 

To perform video enhancement, enhancement vector 
estimator 202 includes one or more caches 205a-205n for 
temporary storage of pixel information relating to
processing of a block of pixels, one or more block 
enhancement units 206a-206n, an enhancement vector memory 
207, and a best enhancement selection unit 208 which 
identifies and selects the best enhancement on a per block 
basis as described in further detail below. 

FIGURE 3 illustrates a logical organization of video 
information for video enhancement with spatio-temporal 
consistency according to one embodiment of the present 
invention. The organization depicted is employed for block 
enhancement by video signal processor 201 depicted in 
FIGURE 2. The video information to be enhanced includes a 
plurality of successive pictures (which may be either 
fields or frames) to be displayed in sequence at a 
predefined rate. "Successive," as used herein, refers to a 
subject picture being in consecutive series with another 
picture within the sequence, without regard to whether the 
subject picture is before or after the other picture within 
the video information. A portion of the sequence of
pictures, n-2, n-1, n, n+1 and n+2, is shown in FIGURE 3.
Each picture comprises a two-dimensional array of pixels
having coordinates (x,y) from the lower left corner of the
picture, where the array function $F(\vec{x}, n)$ represents the
pixel value at position $\vec{x} = (x, y)^T$ and field number n within
the video information at an initial (lower) spatial
resolution. Each picture is logically divided into an
array of blocks of pixels $B(\vec{X})$ of a predetermined number of
pixels in width and height and having a center $\vec{X}$. The
blocks or pixel regions may be rectangular as depicted or
may be any other shape.
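
The logical division of a picture into blocks can be sketched as follows. The helper names are hypothetical, and the sketch assumes rectangular blocks and picture dimensions that are exact multiples of the block size.

```python
def block_centers(width, height, block_w, block_h):
    """Return the center of each block tiling a width x height picture.

    The origin is the lower left corner of the picture, as in the text.
    """
    centers = []
    for by in range(0, height, block_h):
        for bx in range(0, width, block_w):
            centers.append((bx + block_w / 2.0, by + block_h / 2.0))
    return centers


def pixels_in_block(center, block_w, block_h):
    """Return the integer pixel coordinates belonging to the block with this center."""
    x0 = int(center[0] - block_w / 2.0)
    y0 = int(center[1] - block_h / 2.0)
    return [(x, y) for y in range(y0, y0 + block_h) for x in range(x0, x0 + block_w)]


centers = block_centers(width=16, height=8, block_w=8, block_h=4)
last_block = pixels_in_block(centers[-1], block_w=8, block_h=4)
```

Each center identifies one block, and `pixels_in_block` enumerates the positions over which a per-block error would be accumulated.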

Block enhancement units 206a-206n within video signal 
processor 201 enhance the received video information on a 
per block basis. As noted above, spatial resolution 
enhancement will be employed to explain the present 
invention. Specifically, an increase in the spatial 
resolution of the incoming video by a factor of two in both 
spatial dimensions of the fields will be employed to 
describe the present invention. 

An initial estimate of higher spatial resolution video
information $G(\vec{x}, n)$ based on the lower resolution video
information $F(\vec{x}, n)$ may be initially created by a simple
spatial up-conversion--that is, a sample-rate conversion
interpolation filter within block enhancement units 206a-206n
is employed to obtain a higher resolution image.
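
A minimal sketch of such an up-conversion by a factor of two in both dimensions is given below. Bilinear interpolation with edge replication is an assumption chosen for simplicity; the text does not specify the interpolation filter, and the function name is hypothetical.

```python
def upconvert_2x(picture):
    """Double the resolution of a 2-D list of pixel values by bilinear interpolation.

    Original samples land on even output coordinates; the remaining samples are
    averages of their nearest original neighbours (edges are replicated).
    """
    h, w = len(picture), len(picture[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for y in range(2 * h):
        y0 = y // 2
        y1 = min(y0 + (y % 2), h - 1)
        for x in range(2 * w):
            x0 = x // 2
            x1 = min(x0 + (x % 2), w - 1)
            out[y][x] = (picture[y0][x0] + picture[y0][x1]
                         + picture[y1][x0] + picture[y1][x1]) / 4.0
    return out


low_res = [[0, 4],
           [8, 12]]
high_res = upconvert_2x(low_res)
```

The result only serves as the starting point; the enhancement algorithms discussed in the text are then applied on top of it.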

The down-conversion operation $T()$, which defines
down-conversion of the high resolution video information $G(\vec{x}, n)$
to low resolution video information $F(\vec{x}, n)$, given by
$F(\vec{x}, n) = T(G(\vec{x}, n))$, is employed in an error criterion for
selecting the best enhancement of a given block $B(\vec{X})$. The
error criterion, a measure for performance of the enhanced
video information $G(\vec{x}, n)$, is based on differences between
the initial low resolution video information $F(\vec{x}, n)$ and the
low resolution video information $T(G(\vec{x}, n))$ obtained by
down-converting the high resolution video information $G(\vec{x}, n)$ and
is given by:

$$\varepsilon(\vec{C}, \vec{X}, n) = \sum_{\vec{x} \in B(\vec{X})} \left| F(\vec{x}, n) - T(G(\vec{x}, n)) \right| + P_1 + P_2$$

where $\vec{C}$ is a candidate for the enhancement vector
$\vec{V} = \{v_0, v_1, \ldots, v_m\}$ consisting of coefficients which are utilized
to create video information $G(\vec{x}, n)$ according to:

$$G(\vec{x}, n) = \eta(F(\vec{x}, n)) + \sum_{i=0}^{m} v_i W_i(\eta(F(\vec{x}, n)))$$

$W_i()$ within the above equation indicates enhancement of the
image data quality (where spatial resolution has already
been enhanced by sample-rate conversion) by an algorithm i
within a set of algorithms. For example, $W_0(F(\vec{x}, n))$ could be
the image data after frequency peaking while $W_1(F(\vec{x}, n))$ may
be the result after luminance transient improvement.
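
The weighted combination of enhancement algorithms can be sketched in one dimension as follows. This is an illustration under stated assumptions: the up-converted signal is taken as given, each algorithm is modeled as a callable returning a signal of the same length, and `second_difference` is a hypothetical stand-in for one such algorithm rather than a filter named in the text.

```python
def second_difference(signal):
    """Hypothetical enhancement algorithm: a high-pass detail signal (kernel -1, 2, -1)."""
    n = len(signal)
    return [2 * signal[i] - signal[max(i - 1, 0)] - signal[min(i + 1, n - 1)]
            for i in range(n)]


def enhance(upconverted, algorithms, coefficients):
    """Return the up-converted signal plus the weighted sum of algorithm outputs.

    `algorithms` is a list of callables and `coefficients` is the candidate
    enhancement vector, one coefficient per algorithm.
    """
    if len(algorithms) != len(coefficients):
        raise ValueError("one coefficient per algorithm is required")
    result = list(upconverted)
    for v, w in zip(coefficients, algorithms):
        contribution = w(upconverted)
        result = [r + v * c for r, c in zip(result, contribution)]
    return result


upconverted = [0.0, 0.0, 10.0, 10.0]
plain = enhance(upconverted, [second_difference], [0.0])
sharpened = enhance(upconverted, [second_difference], [0.5])
```

A zero coefficient leaves the up-converted signal unchanged, so the coefficients of a candidate vector directly control how strongly each algorithm contributes.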

The penalty $P_1$ within the error function given above is
a monotonic decreasing function of the norm of the
enhancement vector $\vec{V}$, introducing a large penalty for small
coefficients and a small penalty for large coefficients.
The penalty $P_2$ is employed to bias the enhancement vector $\vec{V}$
towards a spatial-temporally consistent solution since this
penalty depends on the selected enhancement vector
candidate $\vec{C}$. Accordingly, the value of penalty $P_2$ is
selected from a predefined list of penalty values which are
optimized for the application.
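
The error criterion with its two penalties might be evaluated as sketched below. Several details are assumptions made for illustration: the differences are accumulated as a sum of absolute values, the first penalty uses the form scale / (1 + norm), and the second penalty is looked up in a hypothetical table keyed by candidate type.

```python
import math

# Hypothetical penalty list: one value per candidate type, tuned per application.
P2_TABLE = {"spatial": 0.0, "temporal": 1.0, "update": 2.0}


def penalty_p1(candidate, scale=4.0):
    """Monotonically decreasing in the norm of the candidate vector (assumed form)."""
    norm = math.sqrt(sum(v * v for v in candidate))
    return scale / (1.0 + norm)


def error(original_block, downconverted_block, candidate, candidate_type):
    """Sum of absolute differences over the block plus the two penalties."""
    sad = sum(abs(f - g) for f, g in zip(original_block, downconverted_block))
    return sad + penalty_p1(candidate) + P2_TABLE[candidate_type]


value = error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], (3.0, 4.0), "temporal")
```

Here `original_block` holds the received low resolution pixels of the block and `downconverted_block` holds the enhanced block after down-conversion, both flattened to lists.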

Each enhancement vector candidate $\vec{C}$ is preferably
selected from enhancement vectors previously determined to
produce the smallest error function values for blocks
within a spatio-temporal neighborhood around the block $B(\vec{X})$
being processed. For example, one reference identified
above suggests a "Y-prediction" estimator for recursive
search block matching motion estimation, in which spatial
prediction vector candidates $\vec{C}_{SP1}$ and $\vec{C}_{SP2}$ are the vectors
selected for blocks one block dimension above and to either
side of and within the same field as the subject block $B(\vec{X})$
while a temporal prediction candidate $\vec{C}_{TP}$ is the vector
selected for a block two blocks directly below and within
the previous field n-1 from the field n containing the
subject block $B(\vec{X})$. Selection of candidate enhancement
vectors from the enhancement vectors which produced optimal
results within the spatio-temporal neighborhood of the
subject block $B(\vec{X})$ speeds the process of determining the
best enhancement (the enhancement vector which produces the
smallest error, or other suitable criteria for enhancement
results, for the subject block $B(\vec{X})$) since it is very
likely that enhancement(s) similar to those producing the
best results for other blocks within the neighborhood of
the subject block $B(\vec{X})$ will produce the best results for
the subject block $B(\vec{X})$.
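
Gathering candidates from the neighborhood described above can be sketched as a lookup into a memory of previously selected vectors. The dictionary layout, the function name, and the default vector are assumptions for illustration; block rows are counted upwards to match the lower left origin used in the text.

```python
def y_prediction_candidates(vector_memory, n, bx, by, default=(0.0, 0.0)):
    """Collect candidates from the spatio-temporal neighbourhood of block (bx, by).

    `vector_memory` maps (field, block_x, block_y) to the vector already chosen
    for that block. Rows are counted upwards, so "above" means by + 1.
    """
    positions = [
        ("spatial", (n, bx - 1, by + 1)),    # one block above, to the left, same field
        ("spatial", (n, bx + 1, by + 1)),    # one block above, to the right, same field
        ("temporal", (n - 1, bx, by - 2)),   # two blocks below, previous field
    ]
    return [(kind, vector_memory.get(key, default)) for kind, key in positions]


memory = {
    (5, 1, 4): (0.5, 0.0),
    (5, 3, 4): (0.2, 0.1),
    (4, 2, 1): (0.0, 0.3),
}
candidates = y_prediction_candidates(memory, n=5, bx=2, by=3)
at_border = y_prediction_candidates(memory, n=5, bx=0, by=0)
```

Blocks with no stored vector, such as those outside the picture, fall back to the default vector in this sketch.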

Alternatively, all possible candidate vectors of 
enhancement algorithms may be tested for each block. 
Moreover, the set of candidate vectors employed may change 
during processing of the video information, with, for 
example, all possible candidate vectors being tested for 
the first few fields of the video information and then a 
smaller subset of candidate vectors being employed for 
remaining fields, or with the selection of candidate 
vectors being otherwise refined as the video information is 
processed. Preferably one candidate is always updated with 
a random update vector. Several candidates may compete 
with each other, with the candidate yielding the smallest 

error $\varepsilon(\vec{C}, \vec{X}, n)$ being selected as the enhancement vector for
the data within the subject block $B(\vec{X})$.
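
The competition among candidates, including one candidate perturbed by a random update, can be sketched as follows. The update rule (changing one coefficient by a fixed step), the toy error function, and all names are assumptions for illustration only.

```python
import random


def add_random_update(vector, rng, step=0.1):
    """Perturb one coefficient by +/- step (a hypothetical update set)."""
    index = rng.randrange(len(vector))
    delta = rng.choice((-step, step))
    updated = list(vector)
    updated[index] += delta
    return tuple(updated)


def select_best(candidates, error_fn):
    """Return the candidate with the smallest error value."""
    return min(candidates, key=error_fn)


def toy_error(candidate, target=(0.5, 0.25)):
    """Stand-in error: distance to an arbitrary toy optimum."""
    return sum(abs(a - b) for a, b in zip(candidate, target))


rng = random.Random(0)
candidates = [(0.0, 0.0), (0.5, 0.2)]
candidates.append(add_random_update(candidates[1], rng))
best = select_best(candidates, toy_error)
```

The random update lets the estimator explore vectors that are not yet present in the neighborhood, while the minimum-error selection keeps the result stable when the update does not help.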

As a result of the present invention, an enhancement 
vector which may be utilized with near-optimal results for 
spatial resolution up-conversion of a particular block is 
selected on a per-block basis. Spatio-temporal consistency 
is automatically achieved. Block erosion similar to, but
not restricted to, the process disclosed in the references
identified above may be employed to prevent blocking
artifacts.

FIGURE 4 is a high level flow chart for a process of
video enhancement with spatio-temporal consistency
according to one embodiment of the present invention. The
process 400, performed by the video signal processor 201
depicted in FIGURE 2 utilizing the logical organization of 
video information illustrated in FIGURE 3, begins with 
receipt (step 401) of video information for enhancement. 
As noted above, the process may be performed for various 
types of enhancements but spatial resolution enhancement 
will be employed to describe the process. 

A block within a current field of the received video 
information is first selected (step 402) and a simple 
enhancement, in this case sample rate conversion, is 
performed. The block is also enhanced utilizing each of a 
plurality of selected candidate enhancement vectors 
consisting of one or more enhancement algorithms employed
jointly or individually, such as frequency peaking and 
luminance transient improvement. 

An error function value, where the error function 
includes a bias towards spatio-temporal consistency, is 
then computed for each candidate enhancement vector (step
404) and the enhancement corresponding to the candidate
vector having the lowest error function value is selected
(step 405) for display as part of the enhanced field.

A determination as to whether all blocks within the
current field have been processed (step 406) is then made,
followed by selection and processing of a next block within
the current field (step 407) if additional blocks remain
and initiation of processing on the next field (step 408)
if the current field has been completely processed. Once
initiated, the process proceeds until interrupted by an
external influence, such as the receiver being turned off
or the reception of video information being interrupted.
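
The control flow of process 400 can be sketched as two nested loops. The callables passed in (`candidates_for`, `enhance_block`, `error_of`) and the toy data are hypothetical placeholders; a field is modeled as a dictionary of blocks, and a finite list of fields stands in for the continuous stream of received video.

```python
def process_field(blocks, candidates_for, enhance_block, error_of):
    """Per-field loop: pick the lowest-error enhancement for every block."""
    selected = {}
    for block_id, block in blocks.items():                # steps 402, 406 and 407
        best_vector, best_output, best_error = None, None, float("inf")
        for candidate in candidates_for(block_id):
            output = enhance_block(block, candidate)      # enhance with this candidate
            err = error_of(block, output, candidate)      # step 404
            if err < best_error:
                best_vector, best_output, best_error = candidate, output, err
        selected[block_id] = (best_vector, best_output)   # step 405
    return selected


def process_video(fields, candidates_for, enhance_block, error_of):
    """Steps 401 and 408: run the per-field loop over each received field in turn."""
    return [process_field(f, candidates_for, enhance_block, error_of) for f in fields]


def toy_candidates(block_id):
    return [0.0, 1.0, 2.0]


def toy_enhance(block, candidate):
    return [pixel * candidate for pixel in block]


def toy_error(block, output, candidate):
    return sum(abs(o - p) for o, p in zip(output, block))


fields = [{"a": [1.0, 2.0], "b": [3.0, 4.0]}]
result = process_video(fields, toy_candidates, toy_enhance, toy_error)
```

In a receiver the outer loop would run until reception stops, and the selected vectors of each field would be retained as candidates for the following field.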

The present invention allows enhancements to video
information (other than position within a repeated field)
to be processed in a manner inherently producing
spatio-temporally consistent results. The error function employed
to select the best enhancement vector of enhancement
algorithms is biased towards spatio-temporal consistency by
addition of a penalty increasing as candidate vectors
are farther from a block being enhanced by either space, time,
or both. As a result, the selected enhancement produces
changes which are gradual over space and time and
inherently free of spatio-temporal varying artifacts.

It is important to note that while the present
invention has been described in the context of a fully
functional hardware-based system and/or network, those
skilled in the art will appreciate that the mechanism of
the present invention is capable of being distributed in
the form of a machine usable medium containing instructions
in a variety of forms, and that the present invention
applies equally regardless of the particular type of signal
bearing medium utilized to actually carry out the
distribution. Examples of machine usable mediums include:
nonvolatile, hard-coded type mediums such as read only
memories (ROMs) or erasable, electrically programmable read
only memories (EEPROMs), recordable type mediums such as
floppy disks, hard disk drives and compact disc read only
memories (CD-ROMs) or digital versatile discs (DVDs), and
transmission type mediums such as digital and analog
communication links.

Although the present invention has been described in 
detail, those skilled in the art will understand that 
various changes, substitutions and alterations herein may 
be made without departing from the spirit and scope of the
invention in its broadest form.